Welcome to another tutorial for this class, COMP/STAT 112: Introduction to Data Science! It will be similar to the others, including demo videos and files embedded in this document and practice problems with hints or solutions at the end. There are some new libraries, so be sure to install those first.
As most of our files do, we start this one with three R code chunks: 1. options, 2. libraries and settings, 3. data.
knitr::opts_chunk$set(echo = TRUE,
                      message = FALSE,
                      warning = FALSE)
library(tidyverse) # for data cleaning and plotting
library(googlesheets4) # for reading googlesheet data
library(lubridate) # for date manipulation
library(openintro) # for the abbr2state() function
library(palmerpenguins)# for Palmer penguin data
library(maps) # for map data
library(ggmap) # for mapping points on maps
library(gplots) # for col2hex() function
library(RColorBrewer) # for color palettes
library(sf) # for working with spatial data
library(leaflet) # for highly customizable mapping
library(ggthemes) # for more themes (including theme_map())
library(plotly) # for the ggplotly() - basic interactivity
library(gganimate) # for adding animation layers to ggplots
library(transformr) # for "tweening" (gganimate)
library(shiny) # for creating interactive apps
gs4_deauth() # To not have to authorize each time you knit.
theme_set(theme_minimal())
# Lisa's garden data
garden_harvest <- read_sheet("https://docs.google.com/spreadsheets/d/1DekSazCzKqPS2jnGhKue7tLxRU3GVL1oxi-4bEM5IWw/edit?usp=sharing") %>%
  mutate(date = ymd(date))
data("penguins")
After this tutorial, you should be able to do the following:
Add basic interactivity to a ggplot2 plot using ggplotly().
Add animation layers to plots using gganimate functions.
Create a shiny app that requires inputs.
Publish a shiny app to shinyapps.io.
plotly
Probably the easiest way to add interactivity to a plot created with ggplot2 is by using the ggplotly() function from the plotly library. The plotly package can do A LOT more than what we’ll cover in this course, as it is a plotting framework of its own. But it can do a lot with just that one function.
Let’s look at an example. In the code below, I compute the cumulative harvest in pounds by vegetable and create a bar graph. I save the graph and print it out. The code and graph should be familiar.
veggie_harvest_graph <- garden_harvest %>%
  group_by(vegetable) %>%
  summarize(total_wt_lbs = sum(weight)*0.00220462) %>%
  ggplot() +
  geom_col(aes(x = total_wt_lbs,
               y = fct_reorder(vegetable,
                               total_wt_lbs,
                               .desc = FALSE))) +
  labs(title = "Total Harvest by vegetable (lb)",
       x = "",
       y = "")

veggie_harvest_graph
Now, we plotly-ify it!
ggplotly(veggie_harvest_graph)
The labeling is fairly ugly in the graph above. I can fix some of that by editing my original plot. In the code below, I add a text aesthetic, which will be used in ggplotly() to display the vegetable name, and use the tooltip argument to tell it which aesthetics to display when hovering over the graph.
veggie_harvest_graph2 <- garden_harvest %>%
  group_by(vegetable) %>%
  summarize(total_wt_lbs = sum(weight)*0.00220462) %>%
  ggplot() +
  geom_col(aes(x = total_wt_lbs,
               y = fct_reorder(vegetable,
                               total_wt_lbs,
                               .desc = FALSE),
               text = vegetable)) +
  labs(title = "Total Harvest by vegetable (lb)",
       x = "",
       y = "")

ggplotly(veggie_harvest_graph2,
         tooltip = c("text", "x"))
This works for many different types of plots created with ggplot2.
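As a quick illustration with a different plot type, here is a minimal sketch of a scatterplot made interactive. It uses the built-in mtcars data rather than the garden data, and the model column and object names are made up for this example.

```r
library(ggplot2)
library(plotly)

# Built-in mtcars data stands in for the garden data (an assumption for this sketch)
cars <- mtcars
cars$model <- rownames(mtcars)

# The text aesthetic is not drawn by geom_point(); ggplotly() uses it for the tooltip
scatter <- ggplot(cars, aes(x = wt, y = mpg, text = model)) +
  geom_point()

# Show the car model plus both coordinates when hovering over a point
interactive_scatter <- ggplotly(scatter, tooltip = c("text", "x", "y"))

# Print interactive_scatter to see the interactive version.
```

The same pattern (build the ggplot, save it, pass it to ggplotly()) should carry over to most of the graphs you have made so far.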
In this exercise, choose 2 graphs you have created for ANY assignment in this class and add interactivity using the ggplotly() function.
gganimate
The gganimate package works well with ggplot2 functions by providing additional grammar that assists in adding animation to the plots. These functions get added as layers in ggplot(), just like you are used to adding geom_*() layers and other layers that modify the graph.
From Thomas Pedersen’s documentation, here are the key functions/grammar of the package:
transition_*() defines how the data should be spread out and how it relates to itself across time (time is not always actual time).
view_*() defines how the positional scales should change along the animation.
shadow_*() defines how data from other points in time should be presented in the given point in time.
enter_*()/exit_*() defines how new data should appear and how old data should disappear during the course of the animation.
ease_aes() defines how different aesthetics should be eased during transitions.
You only need a transition_*() or view_*() function to add animation. This tutorial focuses on three transition_*() functions: transition_states(), transition_time(), and transition_reveal().
Creating an animated plot with gganimate
Start with a base ggplot().
Add appropriate geom_*() layers.
Add a gganimate transition_*() layer.
Add gganimate options, which may include making some changes in the ggplot() code.
transition_*() functions
The gganimate cheatsheet includes an image that gives a nice overview of the three functions.
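Before looking at each transition in detail, the general recipe for an animated plot (base plot, geoms, transition, options) can be sketched in miniature. The toy data frame and object names below are invented for illustration and are not part of the garden data.

```r
library(ggplot2)
library(gganimate)

# Tiny made-up dataset, purely for illustration (not the garden data)
toy <- data.frame(
  group = rep(c("a", "b"), each = 3),
  x = c(1, 2, 3, 2, 3, 4),
  y = c(1, 3, 2, 4, 2, 5)
)

toy_anim <- ggplot(toy, aes(x = x, y = y)) +  # 1. base ggplot()
  geom_point() +                              # 2. geom_*() layer
  transition_states(group) +                  # 3. transition_*() layer
  ease_aes("linear")                          # 4. an optional gganimate setting

# Printing toy_anim (or calling animate(toy_anim)) renders the animation.
```

Nothing is rendered until the object is printed or passed to animate(), so building the plot this way is quick even when rendering is slow.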
transition_states()
This transition is used to transition between distinct stages of the data. We will show an example of transitioning between levels of a categorical variable. We will use the garden_harvest dataset and will follow the steps outlined above for creating an animated plot.
First, we create a dataset of daily tomato harvests in pounds for each variety of tomato. We add day of week and reorder variety from most to least harvested.
daily_tomato <- garden_harvest %>%
  filter(vegetable == "tomatoes") %>%
  group_by(variety, date) %>%
  summarize(daily_harvest = sum(weight)*0.00220462) %>%
  mutate(day_of_week = wday(date, label = TRUE)) %>%
  ungroup() %>%
  mutate(variety = fct_reorder(variety, daily_harvest, sum, .desc = TRUE))

daily_tomato
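To see what the fct_reorder() call with sum and .desc = TRUE does, here is a small sketch on made-up values. The variety labels and weights are invented for illustration only.

```r
library(forcats)

# Made-up varieties and weights (illustration only)
variety <- c("a", "a", "b", "b", "c")
weight_lb <- c(1, 1, 5, 5, 3)

# Order the levels by total weight per variety, largest total first
variety_ordered <- fct_reorder(variety, weight_lb, sum, .desc = TRUE)

levels(variety_ordered)
# "b" "c" "a"  (totals: b = 10, c = 3, a = 2)
```

Without the sum argument, fct_reorder() would order by the median of each group, which is not the same thing as most harvested overall.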
Next, we create a jittered scatterplot of daily harvest by day of week. We facet the plot by variety.
daily_tomato %>%
  ggplot(aes(x = daily_harvest,
             y = day_of_week)) +
  geom_jitter() +
  facet_wrap(vars(variety)) +
  labs(title = "Daily tomato harvest",
       x = "",
       y = "")
Now, instead of looking at the data by faceting, we will use animation and transition by variety. This code takes a while to run. And the animation shows up over in the Viewer in the lower right-hand pane, rather than in the preview below the code chunk.
daily_tomato %>%
  ggplot(aes(x = daily_harvest,
             y = day_of_week)) +
  geom_jitter() +
  labs(title = "Daily tomato harvest",
       x = "",
       y = "") +
  transition_states(variety)
Because it takes a while to create the animation, you don’t want to recreate it each time you knit your file. So, in the code chunk where you create the animation, add eval=FALSE to the code chunk options (i.e., inside the curly brackets next to the lowercase r).
Then, save the GIF using the anim_save() function, as in the code below. The name in quotes is the name of the file that will be created, which needs to end in .gif. By default, this saves your most recent gganimate plot, so be sure to run the code right after you create the animation. Alternatively, you can save your animation to an object, say plot1, and run anim_save("tomatoes1.gif", animation = plot1). The file will be saved to your working directory. If you are working in a project (hopefully the one linked to your GitHub repo, right?), then it will go to the main folder for the project if that is where the .Rmd file is located.
anim_save("tomatoes1.gif")
Then, load the file back in using the following code. You can add echo=FALSE to the code chunk options to omit displaying the code.
knitr::include_graphics("tomatoes1.gif")
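If you would rather not depend on which animation was rendered most recently, one option is to pass the animation explicitly. The helper below is a hypothetical convenience function, not something from gganimate itself; the names save_gif and my_animation are made up for this sketch.

```r
library(gganimate)

# Hypothetical helper: render a gganimate plot and save it as a GIF.
# `anim` is a plot with a transition_*() layer; `file` should end in .gif.
save_gif <- function(anim, file, ...) {
  rendered <- animate(anim, ...)         # extra arguments such as duration go to animate()
  anim_save(file, animation = rendered)  # the file name comes first in anim_save()
  invisible(file)
}

# Example call (not run here because rendering is slow):
# save_gif(my_animation, "tomatoes1.gif", duration = 20)
```

The saved file can then be displayed with knitr::include_graphics() exactly as shown above.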
Now, let’s return to the animation that was created. There are a couple of things we should fix. One is that as it animates, it looks like the observations from one variety morph into the observations from the next variety. We can fix this in two ways. One is to color by variety:
daily_tomato %>%
  ggplot(aes(x = daily_harvest,
             y = day_of_week,
             color = variety)) +
  geom_jitter() +
  scale_color_viridis_d(option = "magma") +
  labs(title = "Daily tomato harvest",
       x = "",
       y = "",
       color = "") +
  theme(legend.position = "none") +
  transition_states(variety)
Another is to map variety to the group aesthetic (this is the recommended way to do it, even if we also color by variety):
daily_tomato %>%
  ggplot(aes(x = daily_harvest,
             y = day_of_week,
             group = variety)) +
  geom_jitter() +
  labs(title = "Daily tomato harvest",
       x = "",
       y = "") +
  transition_states(variety)
Another issue is that we don’t see the variety names as it animates through. Thankfully, the various transition_*() functions create some useful variables we can use to display the names of variety. According to the transition_states() help page, the variables it creates are transitioning, previous_state, closest_state, and next_state.
We can access the variables by putting them in curly braces inside a label. Below, I use the closest_state variable that is created to add the variety to the subtitle of the plot.
daily_tomato %>%
  ggplot(aes(x = daily_harvest,
             y = day_of_week,
             group = variety)) +
  geom_jitter() +
  labs(title = "Daily tomato harvest",
       subtitle = "Variety: {closest_state}",
       x = "",
       y = "") +
  transition_states(variety)
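The curly-brace syntax in the subtitle resembles string interpolation from the glue package. As a rough analogy only (the tutorial does not say what gganimate does internally), the sketch below fills in a label by hand; the value "grape" is a made-up stand-in for whatever state is closest in a given frame.

```r
library(glue)

# Stand-in for the value gganimate supplies while rendering a frame (made up here)
closest_state <- "grape"

# The expression inside the braces is replaced by its value
subtitle <- glue("Variety: {closest_state}")
subtitle
# Variety: grape
```

In the animation, the label is re-evaluated for each frame, so the subtitle changes as the plot moves from one variety to the next.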
There are many options we can change. Below, we make a couple more changes.
Save the animated plot as tomato_gganim and output the animation using animate() in order to control the duration (there are other options in that function, too).
Change the relative transition lengths (how long it takes to switch variety) and state lengths (how long it stays on a variety). These are relative lengths, so the transition time is twice as long as the time spent in a state.
Shrink the points as variety transitions using exit_shrink().
Color the points light blue as they enter and exit.
tomato_gganim <- daily_tomato %>%
  ggplot(aes(x = daily_harvest,
             y = day_of_week,
             group = variety)) +
  geom_jitter() +
  labs(title = "Daily tomato harvest",
       subtitle = "Variety: {closest_state}",
       x = "",
       y = "") +
  transition_states(variety,
                    transition_length = 2,
                    state_length = 1) +
  exit_shrink() +
  enter_recolor(color = "lightblue") +
  exit_recolor(color = "lightblue")

animate(tomato_gganim, duration = 20)
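To make the relative lengths concrete, here is a back-of-the-envelope calculation of how the 20 seconds are divided. It assumes the default wrap = TRUE in transition_states() (so there are as many transitions as states) and ignores rounding to whole frames, so treat the numbers as approximate.

```r
# Values taken from the animation above
transition_length <- 2
state_length <- 1
duration <- 20  # seconds, as passed to animate()

# Share of the running time spent moving between varieties vs. resting on one
share_transition <- transition_length / (transition_length + state_length)
share_state <- state_length / (transition_length + state_length)

# Approximate seconds spent in each mode over the whole animation
seconds_by_mode <- c(transition = duration * share_transition,
                     state = duration * share_state)
round(seconds_by_mode, 1)
```

Under these assumptions, roughly two thirds of the animation is spent in transitions, which is why the points seem to be moving most of the time.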
transition_time()
This transition is used to transition between distinct states in time. We will show an example of transitioning over harvest dates in the garden_harvest dataset. We will follow the steps outlined earlier for creating an animated plot.
First, we create a dataset of daily harvest in pounds for a subset of four vegetables.
daily_harvest_subset <- garden_harvest %>%
  filter(vegetable %in% c("tomatoes", "beans",
                          "peas", "zucchini")) %>%
  group_by(vegetable, date) %>%
  summarize(daily_harvest_lb = sum(weight)*0.00220462)

daily_harvest_subset
Then, we create a static plot, coloring the points differently and assigning different shapes to distinguish the various green colors.
daily_harvest_subset %>%
  ggplot(aes(x = date,
             y = daily_harvest_lb,
             color = vegetable,
             shape = vegetable)) +
  geom_point() +
  labs(title = "Daily harvest (lb)",
       x = "",
       y = "",
       color = "vegetable",
       shape = "vegetable") +
  scale_color_manual(values = c("tomatoes" = "darkred",
                                "beans" = "springgreen4",
                                "peas" = "yellowgreen",
                                "zucchini" = "darkgreen")) +
  theme(legend.position = "top",
        legend.title = element_blank())
Now we animate the plot, transitioning over time by date.
daily_harvest_subset %>%
  ggplot(aes(x = date,
             y = daily_harvest_lb,
             color = vegetable,
             shape = vegetable)) +
  geom_point() +
  labs(title = "Daily harvest (lb)",
       x = "",
       y = "",
       color = "vegetable",
       shape = "vegetable") +
  scale_color_manual(values = c("tomatoes" = "darkred",
                                "beans" = "springgreen4",
                                "peas" = "yellowgreen",
                                "zucchini" = "darkgreen")) +
  theme(legend.position = "top",
        legend.title = element_blank()) +
  transition_time(date)
Now, let’s try adding some other features:
Keep a little history of the data via shadow_wake()
Fade the old data points out via exit_fade()
Add a date subtitle using the frame_time variable created from transition_time().
daily_harvest_subset %>%
  ggplot(aes(x = date,
             y = daily_harvest_lb,
             color = vegetable,
             shape = vegetable)) +
  geom_point() +
  labs(title = "Daily harvest (lb)",
       subtitle = "Date: {frame_time}",
       x = "",
       y = "",
       color = "vegetable",
       shape = "vegetable") +
  scale_color_manual(values = c("tomatoes" = "darkred",
                                "beans" = "springgreen4",
                                "peas" = "yellowgreen",
                                "zucchini" = "darkgreen")) +
  theme(legend.position = "top",
        legend.title = element_blank()) +
  transition_time(date) +
  shadow_wake(wake_length = .3) +
  exit_fade()
transition_reveal()
This transition allows you to let data gradually appear. We will show an example of building up the cumulative harvest data over harvest dates using the garden_harvest dataset. We will follow the steps outlined earlier for creating an animated plot.
First we create a dataset of cumulative harvest by date for a subset of vegetables.
cum_harvest_subset <- garden_harvest %>%
  filter(vegetable %in% c("tomatoes", "beans",
                          "peas", "zucchini")) %>%
  group_by(vegetable, date) %>%
  summarize(daily_harvest_lb = sum(weight)*0.00220462) %>%
  mutate(cum_harvest_lb = cumsum(daily_harvest_lb))

cum_harvest_subset
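The cumulative sum above restarts for each vegetable because, by default, summarize() drops only the last grouping variable (date), leaving the data grouped by vegetable when mutate() runs. The sketch below shows the same group-wise cumsum() on a made-up table; the column names and values are invented for illustration.

```r
library(dplyr)

# Made-up daily totals for two vegetables (illustration only)
toy_harvest <- tibble(
  vegetable = c("beans", "beans", "beans", "peas", "peas"),
  day = c(1, 2, 3, 1, 2),
  daily_lb = c(1, 2, 3, 10, 20)
)

# cumsum() is computed separately within each vegetable
toy_cum <- toy_harvest %>%
  group_by(vegetable) %>%
  mutate(cum_lb = cumsum(daily_lb)) %>%
  ungroup()

toy_cum$cum_lb
# 1 3 6 10 30  (beans: 1, 3, 6; peas: 10, 30)
```

If the data were not grouped, the running total would continue from beans into peas, which is not what we want for a per-vegetable line.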
Next, we create a static plot of cumulative harvest, coloring the lines by vegetable.
cum_harvest_subset %>%
  ggplot(aes(x = date,
             y = cum_harvest_lb,
             color = vegetable)) +
  geom_line() +
  labs(title = "Cumulative harvest (lb)",
       x = "",
       y = "",
       color = "vegetable") +
  scale_color_manual(values = c("tomatoes" = "darkred",
                                "beans" = "springgreen4",
                                "peas" = "yellowgreen",
                                "zucchini" = "darkgreen")) +
  theme(legend.position = "top",
        legend.title = element_blank())
And now, add animation!
cum_harvest_subset %>%
  ggplot(aes(x = date,
             y = cum_harvest_lb,
             color = vegetable)) +
  geom_line() +
  labs(title = "Cumulative harvest (lb)",
       x = "",
       y = "",
       color = "vegetable") +
  scale_color_manual(values = c("tomatoes" = "darkred",
                                "beans" = "springgreen4",
                                "peas" = "yellowgreen",
                                "zucchini" = "darkgreen")) +
  theme(legend.position = "top",
        legend.title = element_blank()) +
  transition_reveal(date)
And now let’s do a couple things to improve the plot:
Remove the legend and add text that shows vegetable name on the plot (I love this!).
Add date to the subtitle.
cum_harvest_subset %>%
  ggplot(aes(x = date,
             y = cum_harvest_lb,
             color = vegetable)) +
  geom_line() +
  geom_text(aes(label = vegetable)) +
  labs(title = "Cumulative harvest (lb)",
       subtitle = "Date: {frame_along}",
       x = "",
       y = "",
       color = "vegetable") +
  scale_color_manual(values = c("tomatoes" = "darkred",
                                "beans" = "springgreen4",
                                "peas" = "yellowgreen",
                                "zucchini" = "darkgreen")) +
  theme(legend.position = "none") +
  transition_reveal(date)
We could have used this same data with a different type of transition. It’s always good to think about the point you are trying to make with the animation.
cum_harvest_subset %>%
  ggplot(aes(x = date,
             y = cum_harvest_lb,
             color = vegetable)) +
  geom_line() +
  labs(title = "Cumulative harvest (lb)",
       subtitle = "Vegetable: {closest_state}",
       x = "",
       y = "",
       color = "vegetable") +
  scale_color_manual(values = c("tomatoes" = "darkred",
                                "beans" = "springgreen4",
                                "peas" = "yellowgreen",
                                "zucchini" = "darkgreen")) +
  theme(legend.position = "none") +
  transition_states(vegetable)
gganimate intro slides by Katherine Goode (she animates bats flying!)
gganimate by Thomas Pedersen - scroll down to the bottom
Pedersen introductory vignette - gives a brief intro to what each of the key functions do
gganimate wiki page - most of this is currently under development, but there are some good examples
shiny